CopyRighter: a rapid tool for improving the accuracy of microbial community profiles through lineage-specific gene copy number correction
Authors
Abstract
BACKGROUND Culture-independent molecular surveys that target conserved marker genes, most notably 16S rRNA, to assess microbial diversity remain semi-quantitative because the number of gene copies varies between species.

RESULTS Based on 2,900 sequenced reference genomes, we show that 16S rRNA gene copy number (GCN) is strongly linked to microbial phylogenetic taxonomy, which can lead to under-representation of Archaea in amplicon microbial profiles. Using this relationship, we inferred the GCN of all bacterial and archaeal lineages in the Greengenes database within a phylogenetic framework. We created CopyRighter, new software that uses these estimates to correct 16S rRNA amplicon microbial profiles and associated quantitative (q)PCR total abundance. CopyRighter parses microbial profiles and, because GCN estimates are pre-computed for all taxa in the reference taxonomy, rapidly corrects GCN bias. Software validation with in silico and in vitro mock communities indicated that GCN correction results in more accurate estimates of microbial relative abundance and improves the agreement between metagenomic and amplicon profiles. Analyses of human-associated and anaerobic digester microbiomes illustrate that correction makes tangible changes to estimates of qPCR total abundance and of α and β diversity, and can significantly change biological interpretation. For example, human gut microbiomes from twins were reclassified into three rather than two enterotypes after GCN correction.

CONCLUSIONS The CopyRighter bioinformatic tool permits rapid correction of GCN in microbial surveys, resulting in improved estimates of microbial abundance and of α and β diversity.
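The arithmetic behind GCN correction can be illustrated with a minimal Python sketch. This is not CopyRighter's code or interface: the function names, the taxon labels, and the copy numbers below are all hypothetical, and the sketch assumes the common approach of dividing each taxon's amplicon count by its estimated 16S copy number and renormalizing. The second function assumes that a qPCR total of 16S copies can be converted to an approximate cell count by dividing by the community-weighted mean copy number.

```python
def correct_gcn(counts, gcn):
    """Return GCN-corrected relative abundances.

    counts: dict of taxon -> amplicon read count (hypothetical input format)
    gcn:    dict of taxon -> estimated 16S rRNA gene copies per genome
    """
    missing = [taxon for taxon in counts if taxon not in gcn]
    if missing:
        raise KeyError(f"no GCN estimate for: {missing}")
    # Reads divided by copies per genome approximate genome (cell) equivalents.
    cell_equiv = {taxon: n / gcn[taxon] for taxon, n in counts.items()}
    total = sum(cell_equiv.values())
    if total == 0:
        raise ValueError("profile contains no reads")
    return {taxon: value / total for taxon, value in cell_equiv.items()}


def corrected_total_abundance(qpcr_copies, counts, gcn):
    """Convert a qPCR total of 16S copies into an approximate cell count."""
    cell_equiv = sum(n / gcn[taxon] for taxon, n in counts.items())
    # Community-weighted mean number of 16S copies per cell.
    mean_gcn = sum(counts.values()) / cell_equiv
    return qpcr_copies / mean_gcn


# Made-up three-taxon profile: raw reads suggest a 6:3:1 community.
counts = {"taxon_A": 600, "taxon_B": 300, "taxon_C": 100}
gcn = {"taxon_A": 6, "taxon_B": 3, "taxon_C": 1}

corrected = correct_gcn(counts, gcn)
cells = corrected_total_abundance(1_000_000, counts, gcn)
```

In this made-up profile the raw read proportions are 60%, 30% and 10%, yet after correction the three taxa are equally abundant, because the high-copy taxon contributes six reads per genome and the low-copy taxon only one. The same logic shows why low-copy lineages are under-represented in uncorrected profiles.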
Similar resources
Correcting for 16S rRNA gene copy numbers in microbiome surveys remains an unsolved problem
The 16S ribosomal RNA gene is the most widely used marker gene in microbial ecology. Counts of 16S sequence variants, often in PCR amplicons, are used to estimate proportions of bacterial and archaeal taxa in microbial communities. Because different organisms contain different 16S gene copy numbers (GCNs), sequence variant counts are biased towards clades with greater GCNs. Several tools have r...
Long-term forest soil warming alters microbial communities in temperate forest soils
Soil microbes are major drivers of soil carbon cycling, yet we lack an understanding of how climate warming will affect microbial communities. Three ongoing field studies at the Harvard Forest Long-term Ecological Research (LTER) site (Petersham, MA) have warmed soils 5°C above ambient temperatures for 5, 8, and 20 years. We used this chronosequence to test the hypothesis that soil microbial co...
SFLA Based Gene Selection Approach for Improving Cancer Classification Accuracy
In this paper, we propose a new gene selection algorithm based on the Shuffled Frog Leaping Algorithm, called SFLA-FS. The proposed algorithm is used for improving cancer classification accuracy. Most biological datasets, such as cancer datasets, have a large number of genes and few samples. However, most of these genes are not usable in some tasks, for example in cancer classification....
Dosimetry limitations and pre-treatment dose profile correction for sliding window IMRT
Background: This work investigated the dosimetry limitations of the random and systematic uncertainties of sliding window (SW) intensity modulated radiation therapy (IMRT). Materials and Methods: A Varian 21EX linear accelerator, Pinnacle3 treatment planning system and radiographic film dosimetry were used. The limitations of the SW were studied using beam modulation ranging from 2 to 10...
Nuclear Architecture and Epigenetics of Lineage Choice
Differentiation is an epigenetic process that is established by changes in transcriptional programs over successive cellular divisions. A number of studies have reported the effects of biochemical modifications of chromatin (DNA and chromatin proteins) on the regulation of transcription. Although these studies are able to explain how transcription of a given gene is regulated (toward activation...